feat: Add SmolLM2 browser-based LLM inference via WebAssembly#2

Merged
konard merged 13 commits into main from issue-1-bef2b5bd2f4e on Dec 30, 2025

Conversation


@konard konard commented Dec 29, 2025

Summary

This PR implements a proof of concept for running the SmolLM2 language model entirely in the browser, with no server-side processing, using WebAssembly for ML inference.

Key Features

  • Browser-based LLM Inference: SmolLM2-135M-Instruct model runs entirely client-side via WebAssembly
  • Rust/WASM Core: Uses HuggingFace Candle ML framework compiled to WebAssembly for efficient inference
  • Web Worker Architecture: Model inference runs in background worker, keeping UI responsive
  • React Chat UI: Modern chat interface using @chatscope/chat-ui-kit-react
  • Streaming Responses: Token-by-token streaming for real-time AI responses
  • GitHub Pages Deployment: Automatic deployment workflow

Technical Implementation

  1. wasm/ - Rust library compiled to WebAssembly

    • Uses candle-core, candle-nn, candle-transformers for SmolLM2 inference
    • Tokenizer support via HuggingFace tokenizers crate
    • Streaming token generation with callback
  2. web/ - React/TypeScript frontend

    • Vite-based build system
    • Web Worker for background model processing
    • Progress tracking for model downloads (~270MB)
  3. server/ - Local Axum dev server for testing

    • Serves static files and WASM with proper MIME types
    • CORS support for development
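The Web Worker architecture above can be sketched as a small message protocol between the UI thread and the inference worker. The message and field names below are illustrative, not the PR's actual protocol:

```typescript
// Illustrative messages the inference worker posts back to the UI thread.
type WorkerResponse =
  | { type: "progress"; loaded: number; total: number } // model download progress
  | { type: "token"; text: string }                     // one streamed token
  | { type: "done" };

// The UI thread appends streamed tokens to the current reply as they arrive,
// so the interface stays responsive while inference runs off the main thread.
function handleResponse(state: { reply: string }, msg: WorkerResponse): string {
  switch (msg.type) {
    case "progress":
      return `loading ${Math.round((msg.loaded / msg.total) * 100)}%`;
    case "token":
      state.reply += msg.text;
      return "streaming";
    case "done":
      return "done";
  }
}
```

In the real app the worker would deliver these messages via `postMessage`, and the React UI would fold them into chat state.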

Files Changed

  • Added WASM inference library in wasm/
  • Added React chat UI in web/
  • Added local dev server in server/
  • Added GitHub Pages deployment workflow
  • Updated root Cargo.toml for workspace

Test Plan

  • WASM compilation succeeds with all required features (bulk-memory, SIMD, etc.)
  • CI/CD Pipeline passes (lint, tests on Linux/macOS/Windows)
  • GitHub Pages build workflow passes
  • TypeScript compilation succeeds without errors

Manual Testing

  1. Run ./scripts/dev.sh to start local server
  2. Open http://localhost:3030 in browser
  3. Click "Load Model" button
  4. Wait for ~270MB model download
  5. Send a message and observe streaming response

Closes #1

🤖 Generated with Claude Code

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #1
konard self-assigned this on Dec 29, 2025
This PR implements a proof of concept for running the SmolLM2-135M language
model directly in the browser using WebAssembly, with no server-side processing.

Key features:
- Rust WASM library using Candle ML framework for model inference
- Web Worker for background processing to keep UI responsive
- React chat UI using @chatscope/chat-ui-kit-react
- Local Rust development server with CORS support
- GitHub Pages deployment workflow
- Streaming token generation for real-time responses

Architecture:
- wasm/: Rust WASM bindings for SmolLM2 inference
- web/: React frontend with TypeScript
- server/: Local development server with Axum

Fixes #1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
konard changed the title from "[WIP] Make a proof of concept library and webpage (for GitHub Pages) and local rust server runnable which are able to run small GPT model in the browser without any installation on client side (no server processing)" to "feat: Add SmolLM2 browser-based LLM inference via WebAssembly" on Dec 29, 2025
konard and others added 10 commits December 29, 2025 14:01
…pace

This fixes the wasm-pack build error where it thought it should be part
of the parent workspace. An empty [workspace] table explicitly declares
this package as its own standalone workspace.
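The fix described here amounts to adding an empty workspace table to the crate manifest. A sketch (the package name is illustrative):

```toml
# wasm/Cargo.toml
[package]
name = "smollm2-wasm"   # illustrative name
version = "0.1.0"
edition = "2021"

# An empty [workspace] table stops cargo from walking up the directory tree
# and attaching this crate to the parent workspace.
[workspace]
```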

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The getrandom 0.3.x crate requires explicit configuration for the
wasm32-unknown-unknown target. This adds:
- .cargo/config.toml with rustflags to enable wasm_js backend
- Updated Cargo.toml comments explaining the configuration
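The rustflags configuration described above would look roughly like this (a sketch of getrandom 0.3's documented backend mechanism):

```toml
# wasm/.cargo/config.toml
[target.wasm32-unknown-unknown]
rustflags = ['--cfg', 'getrandom_backend="wasm_js"']
```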

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Both getrandom 0.2.x and 0.3.x are needed by different dependencies.
Added explicit dependency on getrandom 0.3 with wasm_js feature
to ensure proper WASM compilation.
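The explicit dependency mentioned above would be a one-line addition; getrandom 0.2.x remains in the tree as a transitive dependency of other crates. A sketch:

```toml
# wasm/Cargo.toml
[dependencies]
# Direct dependency on 0.3 so its wasm_js feature is enabled for the
# wasm32-unknown-unknown target; 0.2 is still resolved transitively.
getrandom = { version = "0.3", features = ["wasm_js"] }
```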

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added Cache field to SmolLM2Model struct
- Initialize cache during model loading
- Pass cache to forward() method as required
- Fix dim() error handling with explicit map_err

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
These features are required for the wasm-bindgen output to be valid.
Bulk memory operations are used by the generated WASM code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
wasm-pack doesn't always respect .cargo/config.toml settings.
Setting RUSTFLAGS environment variable directly in the workflow
ensures the bulk-memory feature is enabled during compilation.
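In the workflow, setting the variable directly looks roughly like this (step name and build command are illustrative, not the PR's actual workflow):

```yaml
# .github/workflows step (sketch)
- name: Build WASM package
  env:
    RUSTFLAGS: "-C target-feature=+bulk-memory"
  run: wasm-pack build wasm --target web --release
```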

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rust 1.87 / LLVM 20 generates bulk memory operations by default for
wasm32-unknown-unknown targets. This causes wasm-opt to fail validation
with error: "Bulk memory operations require bulk memory [--enable-bulk-memory]"

Add wasm-pack profile configuration to pass --enable-bulk-memory and
--enable-mutable-globals flags to wasm-opt during the optimization step.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Candle ML framework uses SIMD operations for optimized tensor computations.
Add --enable-simd flag to wasm-opt and +simd128 target feature to Rust
compiler flags to properly support these operations in WebAssembly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Rust 1.87+ / LLVM 20 generates modern WebAssembly features by default:
- nontrapping-float-to-int: For i32.trunc_sat_* saturating conversions
- sign-ext: For sign extension operations
- reference-types: For reference type operations

Add these flags alongside existing bulk-memory, mutable-globals, and simd
flags to pass wasm-opt validation successfully.
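Taken together, the wasm-opt flags accumulated across these commits can be expressed through wasm-pack's profile metadata. A sketch of the final configuration:

```toml
# wasm/Cargo.toml (sketch): flags forwarded to wasm-opt by wasm-pack
[package.metadata.wasm-pack.profile.release]
wasm-opt = [
  "-O",
  "--enable-bulk-memory",
  "--enable-mutable-globals",
  "--enable-simd",
  "--enable-nontrapping-float-to-int",
  "--enable-sign-ext",
  "--enable-reference-types",
]
```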

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
TypeScript's strict mode requires an intermediate cast to unknown when
converting between types that don't overlap. The WASM module's generated
types don't exactly match our SmolLM2Wasm interface.
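A minimal, self-contained illustration of the double cast; `SmolLM2Wasm` and the generated typings below are stand-ins for the PR's actual types:

```typescript
// The interface the app code expects (illustrative stand-in).
interface SmolLM2Wasm {
  generate(prompt: string): string;
}

// Sketch of wasm-bindgen-generated typings with different member names,
// so the two object types don't overlap from TypeScript's point of view.
interface GeneratedWasm {
  run_inference(input: string): string;
}

const raw: GeneratedWasm = {
  run_inference: (input: string) => `echo: ${input}`,
};

// A direct `raw as SmolLM2Wasm` is rejected (TS2352) under strict mode;
// routing through `unknown` opts out of the overlap check. Note the cast
// only silences the type checker -- it does not change the runtime object.
const model = raw as unknown as SmolLM2Wasm;
```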

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
konard marked this pull request as ready for review on December 29, 2025, 13:48

konard commented Dec 29, 2025

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $13.889302 USD
  • Calculated by Anthropic: $10.940544 USD
  • Difference: -$2.948758 (-21.23%)

📎 Log file uploaded as GitHub Gist (1275KB)
🔗 View complete solution draft log

The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard konard merged commit 07b3168 into main Dec 30, 2025
10 checks passed